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Abstract —Approximate Bayesian computation (ABC) using a sequential Monte Carlo method provides a comprehensive platform 
for parameter estimation, model selection and sensitivity analysis in differential equations. However, this method, like other Monte 
Carlo methods, incurs a significant computational cost as it requires explicit numerical integration of differential equations to carry out 
inference. In this paper we propose a novel method for circumventing the requirement of explicit integration by using derivatives of 
Gaussian processes to smooth the observations from which parameters are estimated. We evaluate our methods using synthetic data 
generated from model biological systems described by ordinary and delay differential equations. Upon comparing the performance of 
our method to existing ABC techniques, we demonstrate that it produces comparably reliable parameter estimates at a significantly 
reduced execution time. 
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1 Introduction 

HE time evolution of the variables modelled in 
a variety of science and engineering branches are 
often described by ordinary differential equations that 
are characterised by model structure - the functions of 
the dynamical variables - and model parameters. The 
task of estimating these parameters from experimental 
observations is thus of paramount importance. It is also 
necessary in some cases to choose the most appropriate 
among competing models that describe the observations. 
For parameter estimation and model selection, statistical 
and pattern recognition techniques built upon a Bayesian 
framework have been shown to work extremely well 
for complex non-linear ordinary differential equations in 

mm,® and g). 

To apply Bayesian techniques we need to integrate 
marginal likelihoods, which can be computationally in¬ 
tractable in non-linear differential equation models. For 
this reason some form of approximation such as Monte 
Carlo integration is generally preferred for parameter 
inference. Approximate Bayesian computation (ABC) 
based on sequential Monte Carlo (SMC) is one such 
approximate inference technique that has been applied 
to different classes of dynamical systems described by 
deterministic or stochastic differential equations for both 
parameter estimation and model selection in 0. The 
ABC-SMC algorithm has been shown to work well for 
the examples considered in [5]. ABC-SMC produces 
reliable estimates of parameters and has been used to 
discriminate between a set of candidate models using 
Bayesian model selection criteria. Moreover, ABC-SMC 
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enables the calculation of parameter sensitivities 0. 

ABC methods prove to be most useful for large models 
with complex likelihood surfaces that are difficult to 
evaluate. The operating principle of ABC methods lies in 
replacing the evaluation of likelihoods with a simulation 
based procedure for inference, by using a generative 
model A ig with parameters 9 drawn from a prior distri¬ 
bution it(Q) to simulate observations Y s ~ Aig that are 
compared with the observed data Y d . If the likelihood 
p(Y d \Aig) of observed data Y d is intractable or infeasible 
to compute, then we can use the ABC algorithm to obtain 
samples from the following modified posterior density 

Pe (9,Y s \Y d ) = 

t(A(Y d ,Y s )<e)(Y s ~Aig)n(9) (1) 

f g J Y . l(A(F d ,F s ) < e){Y s ~ Mg)n(9)d9dY^ 

where e > 0 is a tolerance level, A is a distance 
function, 1 is the indicator function and p t {9 ,Y s \Y d ) = 
p(9,Y s \A(Y d ,Y s ) < e). A good (enough) approximation 
of the true marginal posterior distribution is obtained 
when the distance A(Y d ,Y s ) is within a predetermined 
small tolerance e, i.e., 

Pe(9\Y d )= f P e{9,Y s \Y d )dY s m p(9\Y d ) (2) 

Jy s 

Since ABC (including ABC-SMC) requires the generation 
of a number of simulated observations Y s ~ Ain, the 
generation of observations could be a computationally 
expensive process. Thus although ABC-SMC mitigates 
the intractability of evaluating the likelihood function 
through simulation, repeated simulation from complex 
models for inference can itself be burdensome. For the 
case of dynamical systems such simulations require 
explicit numerical solutions of non-linear differential 
equations. Thus, despite its attractive features the ABC- 
SMC algorithm suffers from a major drawback rooted 
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in its computational burden for inference in differen¬ 
tial equations. In particular, the acceptance criterion 
li (A(Y d , Y s ) < e) can lead to the generation of many 
unused simulations Y s ~ Mg, and various methods 
have been proposed in 0 to improve the acceptance 
rate and reduce the run-time of the algorithm. 

In this paper we propose an alternate method of 
speeding up the ABC-SMC algorithm for parameter es¬ 
timation in deterministic models described by ordinary 
differential equations (ODE) or delay differential equa¬ 
tions (DDE) by reducing the time incurred in simulation. 
We achieve this speedup by: (i) completely circumvent¬ 
ing the process of integrating the differential equation by 
operating on the derivative space and (ii) by smoothing 
the derivatives using Gaussian processes (GP). It should 
be noted that using Gaussian processes as functional 
emulators in the derivative space, as a concept, has been 
proposed in flj,|2] for speeding up parameter estimation 
in deterministic differential equations. For parameter 
estimation, GP-based gradient matching has been used 
for ODEs and DDEs using a population Monte Carlo 
sampling 0; an adaptive variant of this approach is pro¬ 
posed for ODEs EJ. See El for a review and comparison 
between these approaches. The novelty of our proposed 
method is the fusion of GP regression with ABC-SMC. 
Our algorithm for fast parameter estimation can be 
easily incorporated into methods for model selection 
and recovering parameter sensitivities for deterministic 
differential equations. 

This paper is organised as follows: in section [2] we 
will introduce the ABC-SMC algorithm for parameter 
estimation in differential equations. In section [3] we will 
show how the algorithm can be sped up by circumvent¬ 
ing the need for the numerical solution of the differential 
equations. In section [4] we will introduce Gaussian pro¬ 
cesses for function estimation and subsequently in sec¬ 
tion [5] we will use GPs within the ABC-SMC to operate 
on the derivative space. In section [6] we will compare 
the performance of our proposed modification with the 
ABC-SMC algorithm 0 and also with its improved 
variant proposed in (6j|. In conclusion, we summarise our 
achievements in this paper and give some indication of 
future work in section [7] 

2 The BASIC ABC FRAMEWORK 

ABC methods generally have the following algorithmic 
form: 

1) Sample a candidate parameter vector 9 from a prior 
distribution n(9) and for each 9 ~ i t(9), simulate a 
dataset Y s ~ Mg from a generative model A ig. 

2) Compute a distance A (Y s ,Y d ) between the simu¬ 
lated dataset, Y s and the experimental data Y d . If 
A (Y d ,Y s ) < e, where e > 0 is the error tolerance 
of accepted solutions, then accept 9 and reject 
otherwise. 

These scheme is repeated until N parameter values are 
accepted, which represent a sample from the approxi¬ 


mate posterior distribution p e (9\Y d ). Exact posterior can 
be obtained from this scheme when e = 0. 

2.1 ABC-SMC for parameter estimation in ODE 

If the prior distribution is very different from the pos¬ 
terior distribution, the basic ABC framework is very 
inefficient as it spends a considerable amount of time 
sampling from areas of low likelihood in parameter 
space, which makes the acceptance rate extremely low. In 
order to improve upon poor acceptance rates and facili¬ 
tate exploration of the parameter space, ABC algorithms 
based on the SMC sampling method were proposed 
in 0, 0 and j9j and sequential importance sampling 
(SIS) in 0, IflOl . 0 applied the ABC algorithm based 
on SIS for parameter estimation and model selection 
for a variety of dynamical systems including non-linear 
ODEs and DDEs, which will also be the focus in this 
paper. Although all the variants of ABC algorithms that 
come under the SMC category can potentially be used 
for inference in dynamical systems, we will specifically 
focus our attention on the ABC approach as adopted in 
0- 

We shall apply ABC-SMC to models of the evolution 
of state X(t) = (Xi(t),.. . , Xk (f) that are governed 
by ODEs or DDEs = f(X(t-t d ),9), where t d 

stands for the time delay in DDEs, with t d = 0 for 
ODEs, and 0 is a vector of parameter values. We ex¬ 
press the integrated solution of the differential equations 
X(t,Xi n ;0) as a map ip t (Xi n ;9) that generates state 
trajectories X (t) given a set of parameters 9 and initial 
conditions X; n = X(t d < t < 0). To generate the samples 
Y s , we add Gaussian noise t] ~ AC 0. <j 2 1k), where I k is 
a K x K identity matrix to the solutions X(t,X i„; 0) to 
create the generative model Mg: 

(Y s ~Mg)^Y s = X(t,X in ;0). (3) 

to be used in the ABC framework. 

A collection of parameter values, called particles 9 are 
sampled from the prior tt(0) to instantiate the generative 
model Mg. To decide whether a particular choice of 
parameters 9 ~ tt(9) is accepted, we need to compare if 
the simulated trajectory Y s ~ Mg is within a tolerance 
level e of the observed trajectory Y d , for which we 
introduce the distance function 

L K 

A(Y d , Y s ) =Y^Y. { yk{ti)-X k {t i )-r lk )\ (4) 

2—1 k =1 

where we assumed that the data were collected, and the 
state evaluated, at integer time points t L = {h}*=i ■■■A- 
Note that for dynamical systems such distance functions 
are generally built by considering the entire time-series 
data instead of some sufficient statistics. 

The sequential stage of this algorithm involves replac¬ 
ing a single tolerance value e by a sequence of tolerance 
values e T/ where r = 0,..., Smc denotes the sequential 
steps, and e T > e T+i . The particles 0 r are indexed by 
t labelling the tolerance level, and are sampled from 
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the posterior distribution obtained from the previous 
sequential step, thus introducing a step-wise procedure 
for generating parameters from a sequence of more 
informative distributions, starting at r = 0 with the 
prior distribution 7r(0). To accept or reject the sampled 
particles for sequence index r, the generated trajectories 
from the model with parameters 0 T must be closer to 
the observed data Y d than those generated from the 
model with parameters 0 T -i in step r — 1. The gener¬ 
ative mechanism for the particles 0 T differs from that 
of 0 T - 1 in that they are sampled from the N particles 
,tv with importance weights (5j w T -\ and 
each 6* ~ {0 i ~ > \ }»=i,...,iv is perturbed by a perturbation 
kernel K T {B\6*) 0. 

For each r = 0 ,..., Smc , the N particles meeting the 
acceptance criterion A (Y s ,Y d ) < e T represent a point- 
wise approximation for the posterior distribution over 
the parameter values: 

1 N 

Per (e\Y d )K>-Y,wP6{e-oW), (5) 

V 2=1 

where is the importance weight of the particle i 0 
and 8(0 — 0 </l> ) is a product of Dirac delta functions, one 
for each component of 0. For a small value of (s MC 
the final collection of particles should be a good point- 
wise approximation to the true posterior distribution. 
The ABC-SMC algorithm is listed in Algorithm 1. 

One way of speeding up the ABC-SMC algorithm is 
by increasing the acceptance rate. To this end a range 
of perturbation kernels were proposed in 0 that result 
in a noticeable change in terms of acceptance rates 
and run-time. However, the major computational burden 
stems from the numerical integration of the differential 
equation and thus a faster simulation method within the 
ABC-SMC is likely to speed up this algorithm possibly 
more than what is gained by clever choice of perturba¬ 
tion kernels. 

3 Gradient based parameter estimation 

IN DIFFERENTIAL EQUATIONS 

We have mentioned previously that the computational 
bottleneck stems from the explicit integration carried out 
in each simulation step. In order to avoid the integration 
one could essentially use a gradient based estimation. 
If the temporal variations in observations Y d (t) are 
believed to be less smooth than the underlying state 
evolution that is modelled by differential equations, we 
shall introduce the target state variable X(t) to be the 
smoothed version X(t) = S(Y d ) of the observations. 
Here, S represents any smoothing procedure, and we 
will use Gaussian Process (GP) regression to perform 
the smoothing below. In the ABC framework, we shall 
accept the trajectories X(t) from the model Me (see (0) 
if they are close to X(t). Once we have X(t) we can 
compute its numerical derivative to obtain the empirical 


Algorithm 1 ABC-SMC as proposed in 0 


1. 

Given Y d , n(0), Me- 

2. 

Initialise e T 

> 0, r = 0, ..., Smc, e T > e T +i- Set r = 0. 

3. 

Set i = 1. 


4. 

if r = 0 then 

5. 

sample 0 

"* independently from n(0): 


0** ~ n(0) 

6. 

else 


7. 

from the 

previous population {0 7 -i}i=i,...,N 


sample 0 

* ~ .tv with associated nor- 


malized weights w*_ x and use the perturbation 


kernel K 7 

(0\0*) to produce 0** ~ K T (0\0*). 

8. 

end if 


9. 

if 7T(0**) = 

0 then 

10. 

go to 4. 


11. 

else 


12. 

Simulate 

a candidate dataset Y s from the model 


Me with 

parameter 0**: Y s ~ Me\e<-o**- 

13. 

end if 


14. 

if A(F d ,y s 

) > e r then 

15. 

go to 4. 


16. 

else 


17. 

Set 0y •<— 0** and calculate the weight for particle 


0 { T l >, 




r 1 , if t = 0 


w? = 

„ , ifT >0 


1 

Y'W^Krie? |0^r) 
l j =1 

18. 

end if 


19. 

if i < N then 

20. 

Set 

T 1 and go to 4. 

21. 

else 

22. 

Normalise the weights. 

23. 

end if 


24. 

if t < Smc 

then 

25. 

Set t r 

+ 1 and go to 3. 

26. 

else 

27. 

return particles 9g at r Smc- 

28. 

end if 



vector field V d (t) of the dynamical system Me'. 

for X(t) 4 S(Y d ), V d (t) 4 ±X(t). (6) 

In addition, while the left hand side of the ODE 
■^X(t) = f(X(t),0) is estimated by the empirical 

derivative V d (t), it should be matched by the model 
vector field f(X(t),0) on the right hand side, when 
evaluated on the smoothed state data f(X(t) = X(t),0). 

Upon introducing a new distance measure between 
V d (t) obtained from the smoothing and f(X(t), 0) ob¬ 
tained from the vector field we can eliminate the original 
distance metric for ABC-SMC between observed, Y d , 
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and the simulated, Y s ~ A ie, trajectories, thus unbur¬ 
dening ABC-SMC of ODE integration at each simulation 
step. The gradient based method was first suggested 
in m where a spline-based smoothing was used to 
denoise the observed data. In this method a cost func¬ 
tion was built using the distance metric in derivative 
space and optimisation was used to minimise this cost 
function in order to obtain point estimates. Recent de¬ 
velopments of this methods are described in IflZl . All 
these approaches suffer from similar problems of using 
additional regularisation parameters for smoothing and 
often the estimates are sub-optimal point estimates. Al¬ 
though porting the derivative based distance within an 
ABC scheme alleviates the computational bottleneck, this 
approach suffers from an inherent shortcoming that is 
rooted in obtaining a numerical derivative as this might 
lead to information loss. 

In our approach, we replace the numerical differenti¬ 
ation with a zero mean Gaussian Process (GP) prior on 
the state X (t) given by 

p(X(t)|0)~SP(O,lT(f,f^)), (7) 

where denotes a covariance function with 

hyperparameters </>. Once such a prior is established then 
Gaussian Process regression techniques can be applied to 
estimate both the state vector X (t) and also the deriva¬ 
tive process V d (t). Using GP regression the derivative 
process can be inferred within a probabilistic framework. 
Hence, we propose to use a distance function in the 
derivative space where the state X(t) and derivative 
V d (t) is modelled using GP regression/within the ABC- 
SMC algorithm. In this way our proposed method is 
based on the GP construction in the derivative space as 
in JT) and (3), combined with the ABC-SMC algorithm 
as proposed in J5). Next, we will briefly introduce GP re¬ 
gression and will apply this to the ABC-SMC algorithm. 

4 Gaussian processes 

For real-valued functions / : A —> M. of one or more 
input variables defined over A, a Gaussian process (GP) 
is a Bayesian non-parametric model that specifies a 
distribution P(f) Ifl3l . Ifl4| , flS] , Ifl6l characterised by a 
mean function /i (x) and a covariance function or kernel, 
K{x, x<fi): 

f(x) ~ gV(n(x),K(x,x';</))), (8) 

where 4> are hyperparameters. For example, the squared 
exponential covariance function, which we use below, is 
given by 

( 1 (x - x') 2 \ 

Kse{x,x') = o-fce rn exp ( -- /2 J , (9) 

with hyperparameters a\ ern and l 2 (variance and char¬ 
acteristic length-scale). 

For a finite number (n) of inputs x* = (x-\ __ x „), 

Xi £ A and for f(x) ~ QTip{x), K{x, x'\ 4>)), the n- 
dimensional vector of function values evaluated at n 


points f(x*) = (f{xi),...,f(x n )) is a random vector 
drawn from a multivariate Gaussian distribution: 

p(f(x*)\x*) = N{y{x*),K{x*,x'*)). (10) 

For performing regression, observations y(x) = 
(y(xi),... ,i/(xl) at L training points x = {x\,...,X l) 
are fit to function / evaluated at x: 

y{x) = fix) + y, (ii) 

as in 10. Given the training data (, y{xi)), i = 1,..., L 
the conditional predictive distribution of the function 
f* = fix*) evaluated at the test points x* is a Gaussian 
with mean and variance given by [ 16) 

E[f*\y,x,x*,<i>} = p(/*)+ 

[K(x*, x)(K(x, x) + <t 2 I) -1 

x(y-M(/*))]> 

Var[f*\y,x,x*,$\ = K{x*,x*)~ 

[K(x*,x)(K(x, x) + (j 2 I) -1 
x K{x ,a:*)], 

where $ = {<j>, a}. 

The hyperparameters $ are inferred as point estimates 
by optimising the logarithm of the marginal likelihood, 
with a zero mean assumption (I6ll : 

logp(y|$) = log J p(y\f ,$)p(f\$)df 

= — ^y T {K{x 1 x) + cr 2 I) -1 y (13) 

~ lo S \ K (x, x ) + <t 2 I| ^ j log(27r). 


4.1 Derivative Gaussian processes 

Since differentiation is a linear operator, the derivative of 
a Gaussian process is another Gaussian process |[T7| . This 
makes it possible to include derivative observations in 
the GP model, or to compute predictions about deriva¬ 
tives. We have 


E 


'dm 

dx 


dE[fjx)} 

dx 


(14) 


And likewise the covariance between partial derivative 
and a function value can be written as 


K {^Sr ,f{x * ) ) = i; K{x ’ x * ) ’ (15) 


and the covariance between partial derivatives follows 


For example considering the squared exponential covari¬ 
ance function given in 0, we can write the covariance 
between partial derivative and a function value as 


K 



(x 


-X*) 

l 2 


K{x, x*) 


(17) 
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The conditional distribution of the derivative of a test 
function ^ having observed the output (a;, y) = 
(x, f(x)) of a noisy function / is given as 




(18) 


where we have the mean function m and covariance 
function S (obtained by using 11 41 and ( fl5l l and con¬ 
sidering the prior mean of the test and training function 
to be zero) given as 


m = 


£ = 


dK ( x *, x) 
dx* 
d 2 K ( x * 


\K(x, x) + <7 2 l] 1 f(x), 


X*) 


dx*dx* 
dK (x*,x) 


dx* 


\K(x, x) + ct 2 I] 


-i dK (x, x*) 
dx* 


(19) 


5 ABC-SMC with Derivative GP 

In this section we apply the machinery reviewed in the 
previous sections to the task of inferring parameters in 
differential equation models whose solution is the state 
trajectory X(t). If we assign a GP prior to the state 
evaluated at time points t L = then the set 

of values of the state X(t L ) takes on a Gaussian prior 
distribution: 

p(X(t L )\t L ) = AT(X{t L )\0,K(t L ,t L )). (20) 

The modelling task is to represent the experimental data 
as 10 Y d = {X{t L ) + r) L } where r/ L refers to L i.i.d. 
samples from J\f(0, <r 2 I) and X (t) satisfies a differential 
equation. We can use GP regression to obtain the expec¬ 
tation and variance of the posterior (given training data 
Y d at t L ) state X(t*) for some test input time point t* 
as in Section U O: 

E[X(t*)\Y d ] = K(t* ,t L )(K (t L ,t L ) + a 2 I)~ 1 Y d , 

Var[X*} = K(t*, i *) - K(t *, t) ( K(t , t) + a 2 l)~ x K(t, t*). 

( 21 ) 

This expected posterior state variable X(t*) for arbitrary 
choice of t* models the smoothed evolution of the state 
X(t) introduced above, and where it is assumed that 
observational noise accounts for deviations from the 
smoothed time course. The smoothed state estimation 
enables us to compute the velocity field using the deriva¬ 
tive GP (as in Section [4~Tb : 

E [j t X} = ^^-(K(t,t)+a 2 I)- lE [X}. ( 22 ) 

This completes the procedure for deriving the empirical 
velocity field © V c (t) = E[^X]. 

To apply this derivative process within the ABC 
framework we need to define a distance metric 
A (V d (t), f (X(t), 0)) between the smoothed velocity 
field derived from the observed data, and the velocity 
field postulated in a differential equation model, where 
the expected state estimation X(t) has been substituted 


for the state variable. Hence our proposed fast alterna¬ 
tive ABC-SMC based on GP gradient distance (GP-ABC- 
SMC) works as follows: 

1) Having given data Y d as a noisy observation of 
the true state variable X(t), assign a GP prior on 
X(t) using 0 and choose a covariance function, 
with some unknown hyperparameters, needed to 
define the GP prior. 

2) Learn the hyperparameters of the covariance func¬ 
tion from the original noisy experimental data Y d 
using maximum likelihood estimation and then 
run GP regression to obtain an estimation of the 
smoothed state evolution X(t) = E[X) using (12Tb 
and the experimental time points t as both the 
training and test input points. 

3) Construct the first derivative of the covariance 
matrix and estimate the derivative process V d (t) = 
E \ft X \x=5c\ usin § EO- 

4) Run the ABC-SMC algorithm with a modified dis¬ 
tance metric A (V d (t) 1 f(X(t),0)) < e T for toler¬ 
ance schedule {e T }, where at each simulation step 
the simulated data Y s = f(X(t). 0) is generated 
by evaluating the velocity field on the right hand 
side of the differential equation. This yields the 
posterior distribution of the parameters p(6\Y d ) 0. 

Note that no explicit solution of differential equation 
is required to generate the simulated data within the 
iterations of the GP-ABC-SMC algorithm. Also note the 
fact that, in order to run the GP-ABC-SMC algorithm, 
no knowledge of the initial condition is required. 

For the sake of conformity with the ABC terminolo¬ 
gies, we will persist in using the terms Y d for observed 
data and Y s for simulated data, as before. However, 
within the context of GP-ABC-SMC algorithm, observed 
and simulated data refer to V d (t) and f(X (/). 6) respec¬ 
tively. That is what Y d and Y s will refer to for the rest 
of the paper. 

5.1 Algorithmic settings 

The success of ABC-SMC algorithm both in terms of 
computational complexity and quality of the solution 
depends on the choice of the e T schedule and the pertur¬ 
bation kernel K T . In this section we will briefly describe 
how we have chosen these two algorithmic settings. 
A detailed discussion concerning the effects of these 
settings can be found in (63. 

5.1.1 Tolerance schedule 

Until recently, tolerance values were manually tuned in 
practice based on prior empirical knowledge about the 
model. An adaptive choice of the tolerance values has 
been proposed in |il"8| and fl9|. In an adaptive schedule 
the value of the tolerance e T is chosen as the a-th 
quantile, where 0 < a < 1 of the distances between the 
observed data Y d and simulated data Y*_ } generated at 
the previous algorithmic time. 
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5.1.2 Perturbation kernel 

Perturbation kernels hold the key to the acceptance rates 
in ABC-SMC and the speed of the algorithm as exploited 
in Perturbation kernels can be broadly divided into 
two categories: a component-wise perturbation kernel 
and a multivariate perturbation kernel. In a component¬ 
wise perturbation kernel 0 ~ A f(0, £ T ) where £ r is a 
diagonal covariance matrix whose diagonal entries cr^ ■ 
j = 1 ,... ,d are chosen adaptively according to the 
previous population labelled by r — 1 flOl . l20| , f6Tl . 

A component-wise perturbation kernel is, by con¬ 
struction, unable to generate particles with correlated 
components; therefore, for models with strongly corre¬ 
lated parameters the ABC-SMC sample generator will 
not be able to reflect the structure of the posterior 
and the acceptance rate will be low. Thus, in order to 
capture such correlations the particles can be perturbed 
according to a multivariate normal distribution with a 
non-diagonal covariance matrix £ r that depends on the 
covariance of the particles as reflected in the population 
in the previous sequential step (r — 1) |6]. Furthermore, 
a multivariate perturbation kernel operating on a subset 
of size N' of the N particles (a local kernel) was also 
shown |6] to produce a noticeable improvement in the 
acceptance rate. In order to define this kernel we will 
introduce some notation. Let Y/^ denote the simulated 
data generated from JVig with particle 9 <— O- 1 , i = 
1 ,... ,N from a population of size N at algorithmic time 
r. The corresponding importance weights are denoted as 
Wt 1 ■ We collect all such particles (along with the weights) 
from algorithmic time r — 1 for which Y r s ^ is not only 
within distance e T _i of the observed data Y d but also 
within distance e T of it. We denote such particles as 4-,: 

{^r} <ni = {^|A(Y d , Y t s «) < e T , 1 < * < n} , 

(23) 

with associated normalised weights 4-i = (4-i/H 
with w = 4-i- 

Having defined the pairs ^4-i; 4-i) we can now 

use a multivariate normal distribution N ’(4-i)^v) > 
with a local covariance Y, l T (termed the optimal local 
covariance in (6j), to perturb a particle 4-i, where local 
refers to particle i. This covariance is given by 
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In the next section we will implement the GP-ABC- 
SMC algorithm to infer parameters of some standard 
non-linear differential equations through some toy exam¬ 
ples and in that process we will compare and contrast 
our GP gradient based approach to that of ABC-SMC 
algorithm with explicit integration. 


6 Evaluation of the algorithm on 

BENCHMARK MODELS 

To evaluate the GP-ABC-SMC algorithm we have chosen 
three benchmarking differential equations: The Lotka 
Volterra predator-prey model |2D . the Hesl loop model 
l22l and signal transduction cascade model |4],0. Each 
is a set of non-linear differential equations modelling 
biological systems and show non-trivial dynamical phe¬ 
nomenon such as limit cycle oscillations and non¬ 
stationary time evolution. For all these examples we 
have used the distance function given by {U and have 
run the ABC-SMC algorithm with explicit integration 
using a component-wise univariate normal kernel (ABC- 
SMC-Comp) (5j as well as a multivariate normal kernel 
with the optimal local covariance matrix (ABC-SMC- 
OLCM) (6). For our proposed GP-ABC-SMC we have 
also used both the aforementioned perturbation kernels. 
We refer these as the GP-ABC-SMC and GP-ABC-OLCM 
respectively. We believe a comparison between these four 
variants of ABC-SMC is required to capture the differ¬ 
ence in speed of execution between the GP based ABC- 
SMC and the previous approaches reported in [j5), |6l , 
while comparing posterior estimates of the parameters. 
For all the examples presented here, we ran all these 
variants of ABC-SMC, including the proposed GP based 
ones, with N = 100 particles using an adaptive tolerance 
schedule set to the a = 0.1 quantile of the distances 
in the previous populations. The ABC-SMC routines 
are written in MATLAB and for the GP regressions the 
GPML package l23l for MATLAB is used in the predator- 
prey and Hesl loop example. For the signal transduc¬ 
tion cascade model the GPMat toolbox for MATLAB 
https://github.com/SheffieldML/GPmat is used 
which has an implementation of the multi-layer per- 
ceptron (MLP) covariance kernel 0, required to handle 
the non-stationarity of some of the state variables. The 
explicit integrations are carried out using MATLAB's 
built in ODE and DDE solver routines. 

6.1 ODE: The predator prey model 

The Lotka Voltera m model depicts an ecological sys¬ 
tem that is used to describe the interaction between a 
predator and prey species. This ODE given by 

x = ax — xy 

■ R (25) 

y = pxy - y, 

shows limit cycle behaviour and has been used for 
benchmarking in [5j, (2). 9 = (a, 0) is the set of parame¬ 
ters and X(t) = (x(t),y(t)) is the state vector comprising 
the concentrations of the predator and the prey species 
respectively. To create a realistic dataset we generated 
11 uniformly spaced samples between the time interval 
(0 < t < 10) from the model with parameters 0 = (1,1) 
and added random Gaussian noise with zero mean and 
standard deviation a = 0.5 to each point. The initial 
values of the ODE for generating the synthetic data are 
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chosen as X(t = 0) = (1.0,0.5). In order to inspect the 
consistency of our proposed algorithm we created two 
more datasets obtained by adding two other realizations 
of the random noise to the ODE time courses. Thus we 
have three sets of artificial data (denoted as Dataset 1, 2 
and 3), each of which has been corrupted by Gaussian 
noise with zero mean and standard deviation a = 0.5 
and sampled separately. Note that the GP-ABC-SMC 
algorithm does not require the estimation of additional 
nuisance parameters related to the initial values. The 
time evolution of the state and its derivative is predicted 
through the GP regression as described in Section [5] We 
have used the squared exponential covariance function 
given by l|9} for the GP regression in this example. 

From the synthetic data we perform the task of pa¬ 
rameter inference using the four different variants of 
ABC-SMC discussed in the last section to compare their 
performance. Both a and /3 are chosen from uniform 
prior distributions U{— 10,10) in all cases. The number of 
algorithmic iterations, the value of Smc is set to Smc = 6 
for the ABC-SMC-Comp and ABC-SMC-OLCM while 
it is set to Smc = 5 for GP-ABC-SMC and GP-ABC- 
OLCM. The values differ because we have chosen these 
on the criteria of minimum number of adaptive iteration 
required for estimating a reliable posterior distribution. 
As the ABC-SMC with integration and the GP based 
variants operate on different spaces thus setting same 
values for Smc does not produce comparable results. 
The specific values of Smc for this and subsequent ex¬ 
amples are chosen on the basis of multiple trials of all the 
four ABC-SMC algorithm on each of the datasets. The 
resulting parameter estimates are listed in Table 1 and 
the evaluation of the performance in Table 2 (left). We 
show in Table 1 the mean with 95% confidence intervals 
of the last population of parameters, approximating the 
marginal posterior, for each variants of the ABC-SMC. 
Note that the run-time of the GP-ABC-SMC and the GP- 
ABC-OLCM algorithms is the sum of the run-time of the 
ABC and the GP regression (including the estimation of 
covariance hyperparameters). The value of a is estimated 
as part of the GP regression. These estimated values 
are a = {0.4752,0.8090}, a = {0.6219,0.3940} and 
a = {0.6432,0.4592} for the dataset 1, 2 and 3 respec¬ 
tively. The results show that the GP-ABC-SMC and the 
GP-ABC-OLCM are considerably faster than the other 
two variants while producing similar results in terms 
of the mean of the estimates. However, the variants of 
ABC-SMC with explicit integration produces narrower 
confidence intervals compared to the GP variants. Inter¬ 
estingly the GP variants generate comparatively fewer 
number of samples, as evident from Table 2 (left), in¬ 
dicating higher parameter sensitivities in the derivative 
space. Also note that the variants of ABC-SMC with a 
local multivariate kernel generate fewer particles com¬ 
pared to the ones with univariate perturbation kernels, 
reducing the run-times. Since dimensionality of the local 
covariance is very small (equal to the ODE parameter 
space) much less effort is required in its computation 


than generating simulated data, even in the derivative 
space. 

6.2 DDE: The Hesl model 

Our proposed algorithm is also able to estimate param¬ 
eters of delay differential equations. The Hesl model 
system is used in systems biology to provide a simplified 
account of the oscillatory behaviour of the concentrations 
(p(t),p(t)) of a species of mRNA and its corresponding 
protein. The model, introduced in 5221/ is described by 
the following delay differential equations: 

1 

fJ 1 + (p(t - t d )/p 0 ) n (26) 

V = I- 1 ~ P-pP, 

where the parameters p m and /i p are decay rates, po is 
the repression threshold, n is the Hill coefficient and t d is 
the time delay. We generated data from the above model 
with parameters p m = 0.03, p p = 0.03, po = 100 and 
t d = 25 and initial conditions p(to) = 3 and p(to) = 3 for 
the concentrations between the interval (0 < t < 300) 
with uniform spacing of At = 2 by numerically solving 
the DDE. n is fixed at a value of 5 If22l . We estimated 
the standard deviations a p = 6.0020 and a p = 121.7670 
of the generated data, for each of the concentrations p(t) 
and p(t). We then added noise, with standard deviation 
set to 0.1 times these estimated standard deviations <j p 
and cr p/ to the data to create the artificial datasets. As 
in the previous example we created three datasets in a 
similar fashion. 

For comparison of performance of the four methods 
in the parameter estimation task, we keep the same 
algorithmic settings, as well as the same covariance func¬ 
tion for the GP regression as in the previously example. 
Unlike the ODE case where our algorithm does not need 
to guess the initial state values, it does need a history 
function for X(t < 0) for DDEs in order to work. In most 
practical cases the initial history function is taken as a 
constant function. Thus in order to make our algorithm 
work, we shifted the first element of the estimated 
state evolution backward in time to create the history 
function. The four variants of ABC-SMC having the same 
settings as before, are applied to this artificial dataset. We 
chose uniform priors for each of the parameters: // m ~ 
U{- 2, 2), p p ~ U{- 2,2), Po ~ U{ 0, 200) and t d ~ ^(0, 50). 
The number of iterations are chosen as Smc = 14 
while running ABC-SMC-Comp and ABC-SMC-OLCM. 
For GP-ABC-SMC and GP-ABC-OLCM this is chosen 
as Smc = 9. As in the previous example these Smc 
values are found through multiple trials of each of the 
algorithms on these datasets and inspecting the quality 
of the posterior estimates. The results are listed in Table 
3 and Table 2 (right). In this example we see a huge 
speedup while using our proposed GP-ABC-SMC and 
GP-ABC-OLCM algorithms, demonstrating the benefits 
of this approach. As in the previous example we noticed 
higher acceptance rates (fewer generated particles) for 



TABLE 1 

Estimated parameters of the Lotka Volterra predator-prey model denoted by the mean of the particles of the final 
population and 95% confidence intervals for datasets 1-3 respectively in each row. 


Parameters 

True value 

ABC-SMC-Comp 

ABC-SMC-OLCM 

GP-ABC-SMC 

GP-ABC-OLCM 

a 

i 

1.0688 ±0.0021 
1.0389 ±0.0062 
1.0293 ±0.0157 

1.0722 ± 0.0032 
1.0305 ± 0.0054 
1.0421 ± 0.0073 

1.0373 ± 0.0080 
1.0843 ± 0.0092 
1.0103 ±0.0172 

1.0356 ± 0.0070 
1.0718 ±0.0102 
1.0144 ±0.0139 

p 

i 

0.9698 ± 0.0025 
0.9749 ± 0.0093 
1.0103 ±0.0206 

0.9763 ± 0.0033 
0.9966 ± 0.0082 
0.9924 ± 0.0095 

0.9567 ±0.0083 
0.9846 ± 0.0080 
0.9887 ±0.0154 

0.9540 ± 0.0067 
0.9760 ± 0.0083 
0.9906 ±0.0126 


TABLE 2 

Run-time and the ratio of total number of particles accepted to that of generated for the four ABC-SMC algorithms 
when applied to the three artificial datasets pertaining to the Lotka Volterra predator-prey model (left) and Hesl 
model (right). The values for run-time are rounded to nearest integers. 


Algorithms 

Run-time (seconds) 

Accept/Generate 

Algorithms 

Run-time (seconds) 

Accept/Generate 


397 

700/14737 


106980 

1300/763045 

ABC-SMC-Comp 

477 

600/12294 

ABC-SMC-Comp 

840330 

1600/8026943 


516 

500/13381 


201710 

1600/1656197 


184 

800/7846 


5496 

1300/31342 

ABC-SMC-OLCM 

221 

700/6369 

ABC-SMC-OLCM 

8399 

1500/50968 


212 

600/6086 


5999 

1400/36519 


25 

500/7650 


30 

1100/31439 

GP-ABC-SMC 

26 

500/7547 

GP-ABC-SMC 

38 

1000/38911 


26 

500/7642 


32 

1000/36521 


21 

500/4655 


18 

1000/7387 

GP-ABC-OLCM 

20 

500/4316 

GP-ABC-OLCM 

18 

1000/8183 


16 

500/3193 


17 

1000/7969 


TABLE 3 

Estimated parameters of the Hesl loop model. 


Parameters 

True value 

ABC-SMC-Comp 

ABC-SMC-OLCM 

GP-ABC-SMC 

GP-ABC-OLCM 

Pm 

0.03 

0.0307 ±2.1608 x 10" 4 
0.0341 ± 6.5563 x 10" 5 
0.0336 ±4.1649 x 10" 5 

0.0305 ± 2.9381 x 10~ 4 
0.0342 ± 4.3428 x 10~ 5 
0.0336 ± 8.5782 x 10“ 5 

0.0295 ± 1.3929 x 10" 4 
0.0304 ± 2.6599 x 10" 4 
0.0291 ± 1.8721 x 10 -4 

0.0293 ± 1.1127 x 10 -4 
0.0302 ±2.3190 x 10“ 4 
0.0292 ± 1.8657 x 10~ 4 

ij. P 

0.03 

0.0294 ± 2.0457 x 10 -4 
0.0267 ±4.4742 x 10“ 5 
0.0268 ±2.9233 x 10 -5 

0.0297 ±2.8397 x 10" 4 
0.0267 ± 2.9662 x 10“ 5 
0.0268 ± 5.9925 x 10“ 5 

0.0300 ± 3.8592 x lO' 0 
0.0300 ±4.1123 x 10“ 6 
0.0297 ±3.5370 x 10" 6 

0.0300 ± 2.8604 x 10-° 
0.0300 ± 3.0506 x 10" 6 
0.0297 ± 3.6808 x 10" 6 

PO 

100 

99.4130 ± 0.0537 
102.1872 ±0.0362 
101.2097 ±0.0279 

99.4518 ±0.0697 
102.2306 ± 0.0281 
101.2549 ±0.0495 

99.5991 ±0.2946 
100.8624 ± 0.2628 
100.0403 ± 0.2796 

99.6997 ±0.2058 
100.8624 ± 0.2302 
100.0593 ± 0.2459 

td 

100 

25.1318 ± 0.0109 
25.2317 ±0.081 
25.0730 ± 0.0055 

25.1580 ±0.0151 
25.2428 ± 0.0056 
25.0714 ±0.0107 

25.0496 ±0.1139 
25.9357 ± 0.2031 
25.3187 ±0.1414 

25.0502 ± 0.0871 
25.6215 ± 0.1589 
25.4469 ± 0.1519 


the GP variants of ABC-SMC. The noise is estimated as 
a^ = 6.8080, a v = 128.6910, cr M = 6.9280, a p = 123.2920 
and <r M = 6.1220, a p = 128.1290 for the dataset 1, 2 and 
3 respectively. Furthermore, from Table 3 it is apparent 
that although the means of the parameters have similar 
estimates, their corresponding confidence intervals are 
different between the proposed GP variants and the orig¬ 
inal ABC-SMC algorithms. However it should be consid¬ 
ered that for GP-ABC-SMC (with both the perturbation 
kernels) no knowledge of the initial history function is 
required. Thus in a practical setting we believe a GP 
based ABC-SMC algorithm is the optimal choice among 
these four methods for parameter estimation in DDEs. 


6.3 ABC variability 

Our proposed method comprises of two levels of ap¬ 
proximation, one induced through the GP regression and 
the other one resulting from the approximate inference 
scheme. Thus in order to check the robustness of our 
proposed algorithm we repeated the GP-ABC-SMC and 
GP-ABC-OLCM parameter inference steps for 50 runs 
on each of the three artificial datasets for both the Lotka 
Volterra and Hesl models. We used the same algorithmic 
settings and prior distributions as in the previous ex¬ 
amples. Fig. [0 and Fig. [2] summarizes the distributions 
of the sample mean and variance (corresponding to the 
final particle population for each run of of GP-ABC- 
SMC and GP-ABC-OLCM on the three artificial datasets) 
across all the 50 rims on the data from Lotka Volterra and 























9 


Hesl respectively. 

It is evident from Fig. [Tj that the GP-ABC-OLCM algo¬ 
rithm produces fewer outliers compared to the GP-ABC- 
SMC for both the mean and variance estimates. This can 
be attributed to the local moves in the parameter space 
caused by the multivariate (OLCM) perturbation kernel. 
Furthermore, note that the distributions of the variances 
resulting from each of the algorithms are skewed in a 
similar manner with the outliers located at the same 
direction (for both GP variants) from the median. Thus 
these outliers represent greater variance of posterior dis¬ 
tribution of the parameters. However, in more than 90% 
out of the 50 runs the moments for both the parameters 
lie within the inter-quantile range. 

In case of the Hesl model it is apparent from Fig. [2] 
that the distributions are less variable across multiple 
runs and variants of the algorithms. Moreover, in this 
case we notice that the distribution of the variances have 
very few outliers indicating greater accordance among 
the posteriors learnt after each run of the algorithms. 
However, it should be noted that an adaptive toler¬ 
ance schedule results in different (marginally) tolerance 
values for each new run of the ABC-SMC. Thus some 
amount of variabilty in the moments corresponding to 
different runs is attributed to the differing tolerances. 
Fig. [3] and Fig. [4] show the learnt state trajectories of 
the Lotka Volterra and Hesl model compared against 
the true state trajectories for each of the datasets. The 
true trajectories correspond to the true parameters and 
the reconstructed trajectories are generated by solving 
the Lotka Volterra ll25l l and Hesl (f26t model equations. 
While solving (numerically integrating) these differential 
equations the parameters are taken as the median of 
the parameters learnt by the GP-ABC-SMC algorithm 
considering all the 50 runs. The median value is consid¬ 
ered here to reflect the effect of variability (in parameter 
learning by the GP-ABC-SMC) in reconstructing the 
dynamics of the considered models. 


6.4 Signal transduction cascade 

We have, so far, used the benchmarking examples to 
compare our proposed GP-based ABC-SMC approach to 
others of that ilk that exist in the literature. In this exam¬ 
ple we will compare the parameter estimation results for 
the proposed GP based ABC-SMC with other (methods 
not falling under ABC) recent GP based approximate in¬ 
ference methods for parameter estimation in ODEs. For 
this purpose we have chosen the signal transduction cas¬ 
cade model [4|. Using this model, a comparison between 
the competing GP based approaches were reported in f3| . 
Thus evaluating the proposed GP based ABC-SMC algo¬ 
rithm on this model (with identical settings to those in 
|3|) will enable us to draw comparisons with these other 
methods. This model is described by a 5-dimensional 
coupled ODEs given by 


Lotka Volterra Summary of Mean over 50 Runs 
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(a) Distribution of the mean for Lotka Volterra across 50 
runs of GP-ABC-SMC and GP-ABC-OLCM. 

Lotka Volterra Summary of Variance over 50 Runs 
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(b) Distribution of the variance for Lotka Volterra across 
50 runs of GP-ABC-SMC and GP-ABC-OLCM. 


Fig. 1. The boxplots represent the distributions of the 
mean and variances (across 50 runs) of the final pop¬ 
ulation representing the marginal approximate posterior 
parameter distributions learnt by the GP-ABC-SMC and 
the GP-ABC-OLCM from the three artificial datasets of 
Lotka Volterra model. 


4 s\ 

dt 

d[S d ] 

dt 

d[R] 
dt 
d[RS] 
dt 

dt 


-fci[S]-fe[S][iq + fc 3 [i2S] 

h[S] 

-WI + MSSR 

k 2 [S][R] - k 3 [RS] - fc 4 [RS] 
V[Rpp\ 


k 4 [RS} - 


K m + [Rpp] 


(27) 


where 6 = (k\, k 2 , k 3 , k 3 . V. k m ) are the parameters 
of this model and X(t) = ([5], [S d ], [-R], [RS], [f? PP ]) 
are the concentrations of the state variables. Following 
m we generated data from the model between the 
time interval (0 < t < 100) with parameters 9 = 
(0.07,0.6,0.05,0.3,0.017,0.3) and initial values of the 
state variable [S'] = 1, [Sd] = 0, [R] = 1, [RS] = 0, 
[Rpp] = 0. We then sampled the data at time t L = 
{0,1,2,4,5, 7,10,15,20,30,40,50,60,80,100} and added 
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Hess 1 Summary of Mean over 50 Runs 



Ef 


fi T 
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(a) Distribution of the mean for Hesl across 50 runs of GP-ABC- 
SMC and GP-ABC-OLCM. 



(a) Trajectories of x(t). 



(b) Distribution of the variance for Hesl across 50 runs of GP- 
ABC-SMC and GP-ABC-OLCM. 



Fig. 2. Distributions of the mean and variances learnt by 
the GP-ABC-SMC and the GP-ABC-OLCM from the three 
artificial datasets of Hesl model. 


random noise with standard deviation ays\r °"S d > r]> 

a [RS}> a [R PP ] set to 0.1 for generating the synthetic data. 
For inferring parameters in this example we apply the 
GP-ABC-OLCM algorithm from our study with multiple 
runs, where we found this algorithm to provide a stable 
and fast inference mechanism. The non-stationarity in 
the time evolution of the state variables is captured by 
the MLP covariance function given by 


fc(M') = O-feernX 


2 . 

—asm 

7T 



crlt T t' + erg _ 

^T+ly/alW + ai + l 



where the kernel variance cr 2 ern , the neural network 
weight variance a , 2 , and the bias variance cr 2 are the hy¬ 
perparameters of the covariance function. The derivative 
of this kernel with respect to the input time t is given 
by 


where 


dk(t, t') 

vlern ^ 

(29) 

dt 

VI - Z 2 dt' 

Y — 

a 2 w t T t' + cr 2 

(30) 


y 

norm 


with Z norm = All 

the other algorithmic settings were kept the same. The 
prior distributions are chosen as k\ ~ W(0.05, 0.09), 


(b) Trajectories of y(t). 

Fig. 3. Reconstructed and true state trajectories of the 
Lotka Volterra model. The results corresponding to the 
three datasets (1, 2, 3) are shown using blue, red and 
magenta colours respectively. Reconstructed trajectories 
are represented as curves and observations as stars. The 
ground truth is the black (dashed and circled) curve. 


k 2 ~ li(0A, 0.8), k 3 ~ ^(0.03,0.07), k 4 ~ W(0.1,0.5), 
V ~ U (0.015,0.0195) and k m ~ W(0.1,0.5). In this 
example Smc is set to 3. 

The resulting parameter estimates are furnished in 
Table 4 along with the parameter estimates obtained 
from other GP based algorithms run on the same model. 
These algorithms are the GP-ODE method proposed in 
0/ the adaptive gradient matching (AGM) proposed in 
0 and the gradient matching (GM) proposed in (TJ. The 
posterior distributions of the parameters as learnt by the 
GP-ABC-OLCM is shown in Fig. [6] We have compared 
the true state trajectories with the reconstructed trajecto¬ 
ries in Fig. [5] We generated the reconstructed trajectories 
by solving ll27b using the mean of the final population 
of GP-ABC-OLCM, representing the marginal posterior 
densities of the parameters. The estimated values of the 
standard deviations are cr[s] = 0.0964, <7[s d ] = 0.0818, 
CT[ R ] = 0.0707, cr [ft s] = 0.0591 and a [Rpp ] = 0.0754. We 
avoided the comparison of run-time or acceptance rates 
as the GP-ABC-OLCM and other GP based algorithms 
depend on completely different approximate inference 
scheme. However, GP-ABC-OLCM is significantly faster 
than the other approaches. The GP-ABC-OLCM finishes 
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TABLE 4 

Estimated parameters of the signal transduction cascade by all the GP based approaches including the 
GP-ABC-OLCM. The estimates for GP-ODE, AGM and GM are taken from (3). 


Parameters True value GP-ABC-OLCM 


GP-ODE 


AGM 


GM 


fci 
k 2 
kz 
k4 
V 
km 


0.070 

0.6 

0.05 

0.3 

0.017 

0.3 


0.0708 ± 0.0086 
0.5806 ± 0.0706 
0.0480 ± 0.0074 
0.3439 ± 0.0659 
0.0170 ±0.0009 
0.3110 ±0.0774 


0.0747 ±0.0130 
0.6230 ±0.1246 
0.0530 ±0.0135 
0.2960 ±0.0281 
0.0177 ±0.0014 
0.4220 ± 0.0690 


0.0771 ±0.0130 
0.5460 ±0.1259 
0.0593 ±0.0111 
0.3750 ± 0.0999 
0.0172 ±0.0015 
0.4090 ±0.0911 


0.0762 ±0.0130 
0.5632 ±0.1256 
0.0594 ±0.0115 
0.3754 ±0.1051 
0.0173 ±0.0014 
0.4186 ± 0.0953 



(a) Trajectories of /r(t). 
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Fig. 4. Reconstructed and true state trajectories of the 
Hesl model. 


Fig. 6. Posterior distributions of the parameters of the 
signal transduction cascade model learnt by the proposed 
GP-ABC-OLCM algorithm. 


by ordinary and delay differential equations . We achieve 
this speed-up by circumventing the need to numeri¬ 
cally integrate the differential equations, a task that is 
repeatedly required in other ABC methods to generate 
samples from candidate models for comparison with the 
observed data. The key idea behind our method lies in 
building on (T|, fl2l and 13 to work directly with the 
vector field of the dynamical system, which we model 
using Gaussian process regression, and thus create a 
distance function in derivative space for use in the ABC- 
SMC algorithm as proposed in (5). Thus we proposed a 
modified ABC-SMC algorithm for parameter estimation 
in ODEs or DDEs. 


the estimation in around 20 seconds while the other 
methods were run for 30 minutes to obtain a properly 
mixed Markov chain. The ratio of the last two param¬ 
eters V/k rn 12] is a crucial quantity that determines 
the reconstruction accuracy. GP-ABC-OLCM is able to 
infer this quantity with the best (based on the estimated 
posterior means of V and k rn ) accuracy among all the 
GP based algorithms. 

7 Conclusion 

In this paper we have proposed a method that signif¬ 
icantly speeds up the task of parameter inference in 
comparison with state of the art methods that use ABC 
and SMC based approaches when applied to several 
benchmark dynamical system models that are described 


We benchmarked the benefits of this approach by eval¬ 
uating our proposed approach on toy problems where 
we observed a significant speed-up of the parameter 
estimation process. We also compared our proposed 
approach with other GP based methods proposed in 
recent literature and found that our proposed GP-ABC- 
SMC (with the local multivariate perturbation kernel) 
performs significantly faster to obtain similar estimates. 
Furthermore, improvements of ABC-SMC through per¬ 
turbation kernels as proposed in [6] and has been in¬ 
tegrated with our approach to obtain enhanced perfor¬ 
mance. Thus, our fast approximate inference process can 
accommodate the useful features of ABC-SMC (as shown 
i n El) such as model selection and sensitivity analysis. 
Our proposed approach is only limited by the ability 
of the Gaussian process regression in smoothing the 
observed time series data while retaining the essential 
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Fig. 5. Results of GP-ABC-OLCM for the signal transduction cascade (([5], [S d ], [i?], [RS], [i? pp ] in (a),(b),(c),(d) and 
(e) respectively). In all the plots observations are the black stars, the true state trajectory is the red (dashed) curve 
and the blue curve shows the reconstructed trajectory. We have also plotted the GP mean function as the magenta 
curve and GP variance as the shaded area. The reconstructed trajectory is generated by numerically integrating {27} 
with the parameters set to the mean of the posterior distribution estimated by the GP-ABC-OLCM algorithm. 


characteristics that are meant to be captured by the 
dynamical system model. Thus in those cases where 
smoothing the experimental data by GP regression in¬ 
troduces artefacts, the GP-ABC-SMC algorithm would 
produce poor parameter estimates. As future work we 
wish to extend the GP-ABC-SMC algorithm for stochas¬ 
tic differential equation by relating the GP regression 
technique with drift estimation technique in f24l . We 
will also include models with hidden variables, as the 
smoothing procedure on observed data can no longer 
provide complete information. 
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